Binary search algorithm


Anthony Lin* et al.


Abstract 


In computer science, binary search, also known as half-interval search, logarithmic search, or binary chop, is a search algorithm that finds the position of a target value within a sorted array. Binary search compares the target value to the middle element of the array. If they are not equal, the half in which the target cannot lie is eliminated and the search continues on the remaining half, again taking the middle element to compare to the target value, and repeating this until the target value is found. If the search ends with the remaining half being empty, the target is not in the array.


Binary search runs in logarithmic time in the worst case, making O(log n) comparisons, where n is the number of elements in the array, O is big O notation, and log is the logarithm. Binary search is faster than linear search except for small arrays. However, the array must be sorted first to be able to apply binary search. There are specialized data structures designed for fast searching, such as hash tables, that can be searched more efficiently than binary search. However, binary search can be used to solve a wider range of problems, such as finding the next-smallest or next-largest element in the array relative to the target even if it is absent from the array.


There are numerous variations of binary search. In particular, fractional cascading speeds up binary searches for the same value in multiple arrays. Fractional cascading efficiently solves a number of search problems in computational geometry and in numerous other fields. Exponential search extends binary search to unbounded lists. The binary search tree and B-tree data structures are based on binary search.


Algorithm 


Binary search works on sorted arrays. Binary search begins by comparing an element in the middle of the array with the target value. If the target value matches the element, its position in the array is returned. If the target value is less than the element, the search continues in the lower half of the array. If the target value is greater than the element, the search continues in the upper half of the array. By doing this, the algorithm eliminates, in each iteration, the half in which the target value cannot lie.


Procedure 


Given an array $A$ of $n$ elements with values or records $A_0, A_1, A_2, \ldots, A_{n-1}$ sorted such that $A_0 \le A_1 \le A_2 \le \cdots \le A_{n-1}$, and target value $T$, the following subroutine uses binary search to find the index of $T$ in $A$.

1. Set $L$ to $0$ and $R$ to $n - 1$.
2. While $L \le R$,
    1. Set $m$ (the position of the middle element) to the floor of $\frac{L+R}{2}$, which is the greatest integer less than or equal to $\frac{L+R}{2}$.
    2. If $A_m < T$, set $L$ to $m + 1$.
    3. If $A_m > T$, set $R$ to $m - 1$.
    4. Else, $A_m = T$; return $m$.
3. If the search has not returned a value by the time $L > R$, the search terminates as unsuccessful.


This iterative procedure keeps track of the search boundaries with the two variables L and R. The procedure may be expressed in pseudocode as follows, where the variable names and types remain the same as above, floor is the floor function, and unsuccessful refers to a specific value that conveys the failure of the search.


function binary_search(A, n, T):
    L := 0
    R := n - 1
    while L <= R:
        m := floor((L + R) / 2)
        if A[m] < T:
            L := m + 1
        else if A[m] > T:
            R := m - 1
        else:
            return m
    return unsuccessful


Alternatively, the algorithm may take the ceiling of $\frac{L+R}{2}$, or the least integer greater than or equal to $\frac{L+R}{2}$. This may change the result if the target value appears more than once in the array.
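The procedure above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function name `binary_search`, the use of `None` as the "unsuccessful" value, and the use of `len(a)` in place of an explicit n are choices made for this example.

```python
from typing import Optional, Sequence


def binary_search(a: Sequence[int], target: int) -> Optional[int]:
    """Return an index of target in the sorted sequence a, or None if absent."""
    lo, hi = 0, len(a) - 1          # L and R in the procedure
    while lo <= hi:
        mid = (lo + hi) // 2        # floor of (L + R) / 2
        if a[mid] < target:
            lo = mid + 1            # target can only lie in the upper half
        elif a[mid] > target:
            hi = mid - 1            # target can only lie in the lower half
        else:
            return mid              # a[mid] == target
    return None                     # stands in for "unsuccessful" (an assumption)
```

Python integers do not overflow, so `(lo + hi) // 2` is safe here; in languages with fixed-width integers the midpoint is often computed as `lo + (hi - lo) // 2` instead.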


Alternative procedure 


In the above procedure, the algorithm checks whether the middle element (m) is equal to the target (T) in every iteration. Some implementations leave out this check during each iteration. The algorithm would perform this check only when one element is left (when L = R). This results in a faster comparison loop, as one comparison is eliminated per iteration. However, it requires one more iteration on average.

Hermann Bottenbruch published the first implementation to leave out this check in 1962.

1. Set $L$ to $0$ and $R$ to $n - 1$.
2. While $L \ne R$,
    1. Set $m$ (the position of the middle element) to the ceiling of $\frac{L+R}{2}$, which is the least integer greater than or equal to $\frac{L+R}{2}$.
    2. If $A_m > T$, set $R$ to $m - 1$.
    3. Else, $A_m \le T$; set $L$ to $m$.
3. Now $L = R$, the search is done. If $A_L = T$, return $L$. Otherwise, the search terminates as unsuccessful.


Where ceil is the ceiling function, the pseudocode for 
this version is: 


function binary_search_alternative(A, n, T):
    L := 0
    R := n - 1
    while L != R:
        m := ceil((L + R) / 2)
        if A[m] > T:
            R := m - 1
        else:
            L := m
    if A[L] == T:
        return L
    return unsuccessful
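A Python sketch of this alternative procedure is given below. The function name, the `None` return value for an unsuccessful search, and the explicit guard for an empty array are assumptions added for this example; the pseudocode itself presumes at least one element.

```python
from typing import Optional, Sequence


def binary_search_alternative(a: Sequence[int], target: int) -> Optional[int]:
    """Binary search that defers the equality check until one element is left."""
    if not a:
        return None                  # guard added here; not part of the pseudocode
    lo, hi = 0, len(a) - 1
    while lo != hi:
        mid = (lo + hi + 1) // 2     # ceiling of (L + R) / 2 for integers
        if a[mid] > target:
            hi = mid - 1
        else:
            lo = mid                 # a[mid] <= target, so keep mid in range
    return lo if a[lo] == target else None
```

Taking the ceiling matters: with the floor, the assignment `lo = mid` could leave the range unchanged when `hi == lo + 1`, and the loop would not terminate.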
Duplicate elements


The procedure may return any index whose element is equal to the target value, even if there are duplicate elements in the array. For example, if the array to be searched was [1, 2, 3, 4, 4, 5, 6, 7] and the target was 4, then it would be correct for the algorithm to return either the 4th (index 3) or the 5th (index 4) element. The regular procedure would return the 4th element (index 3). However, it is sometimes necessary to find the leftmost element or the rightmost element for a target value that is duplicated in the array. In the above example, the 4th element is the leftmost element of the value 4, while the 5th element is the rightmost element of the value 4. The alternative procedure above will always return the index of the rightmost element if such an element exists.


Procedure for finding the leftmost element 


To find the leftmost element, the following procedure can be used:

1. Set $L$ to $0$ and $R$ to $n$.
2. While $L < R$,
    1. Set $m$ (the position of the middle element) to the floor of $\frac{L+R}{2}$, which is the greatest integer less than or equal to $\frac{L+R}{2}$.
    2. If $A_m < T$, set $L$ to $m + 1$.
    3. Else, $A_m \ge T$; set $R$ to $m$.
3. Return $L$.

If $L < n$ and $A_L = T$, then $A_L$ is the leftmost element that equals $T$. Even if $T$ is not in the array, $L$ is the rank of $T$ in the array, or the number of elements in the array that are less than $T$.


Where floor is the floor function, the pseudocode for 
this version is: 


function binary_search_leftmost(A, n, T):
    L := 0
    R := n
    while L < R:
        m := floor((L + R) / 2)
        if A[m] < T:
            L := m + 1
        else:
            R := m
    return L
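The leftmost-element procedure can be sketched in Python as below. The function name is chosen for this example; for sorted Python lists the standard library function `bisect.bisect_left` returns the same value.

```python
from typing import Sequence


def binary_search_leftmost(a: Sequence[int], target: int) -> int:
    """Return the rank of target: the number of elements of a less than target.

    If the result i satisfies i < len(a) and a[i] == target, then i is the
    index of the leftmost element equal to target.
    """
    lo, hi = 0, len(a)               # R starts at n, one past the last index
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] < target:
            lo = mid + 1
        else:
            hi = mid                 # a[mid] >= target, so mid may be the answer
    return lo
```

For the array [1, 2, 3, 4, 4, 5, 6, 7] and target 4, the function returns 3, the index of the leftmost 4.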


Procedure for finding the rightmost element 


To find the rightmost element, the following procedure can be used:

1. Set $L$ to $0$ and $R$ to $n$.
2. While $L < R$,
    1. Set $m$ (the position of the middle element) to the floor of $\frac{L+R}{2}$, which is the greatest integer less than or equal to $\frac{L+R}{2}$.
    2. If $A_m > T$, set $R$ to $m$.
    3. Else, $A_m \le T$; set $L$ to $m + 1$.
3. Return $L - 1$.

If $L > 0$ and $A_{L-1} = T$, then $A_{L-1}$ is the rightmost element that equals $T$. Even if $T$ is not in the array, the number of elements in the array that are greater than $T$ is $(n - 1) - r$, where $r = L - 1$ is the returned value.


Where floor is the floor function, the pseudocode for 
this version is: 


function binary_search_rightmost(A, n, T):
    L := 0
    R := n
    while L < R:
        m := floor((L + R) / 2)
        if A[m] > T:
            R := m
        else:
            L := m + 1
    return L - 1
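A Python sketch of the rightmost-element procedure follows. The function name is chosen for this example; for sorted Python lists the returned value equals `bisect.bisect_right(a, target) - 1`.

```python
from typing import Sequence


def binary_search_rightmost(a: Sequence[int], target: int) -> int:
    """Return r = L - 1, where L is the number of elements of a that are <= target.

    If r >= 0 and a[r] == target, then r is the index of the rightmost
    element equal to target. A result of -1 means every element is greater.
    """
    lo, hi = 0, len(a)
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] > target:
            hi = mid
        else:
            lo = mid + 1             # a[mid] <= target, answer lies to the right
    return lo - 1
```

With r as the returned value, `len(a) - 1 - r` is the number of elements greater than the target.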


Approximate matches 


The above procedure only performs exact matches, finding the position of a target value. However, it is trivial to extend binary search to perform approximate matches because binary search operates on sorted arrays. For example, binary search can be used to compute, for a given value, its rank (the number of smaller elements), predecessor (next-smallest element), successor (next-largest element), and nearest neighbor. Range queries seeking the number of elements between two values can be performed with two rank queries.


• Rank queries can be performed with the procedure for finding the leftmost element. The number of elements less than the target value is returned by the procedure.

• Predecessor queries can be performed with rank queries. If the rank of the target value is r, its predecessor is the element at index r − 1.

• For successor queries, the procedure for finding the rightmost element can be used. If the result of running the procedure for the target value is r, then the successor of the target value is the element at index r + 1.


3 of 13 | WikiJournal of Science 


WikiJournal of Science, 2019, 2(1):5 
doi: 10.15347/wjs/2019.005 
Encyclopedic Review Article 


Figure 1 | Binary search can be adapted to compute approximate matches. In the example shown, the rank, predecessor, successor, and nearest neighbor are given for the target value 5, which is not in the array.


• The nearest neighbor of the target value is either its predecessor or successor, whichever is closer.

• Range queries are also straightforward. Once the ranks of the two values are known, the number of elements greater than or equal to the first value and less than the second is the difference of the two ranks. This count can be adjusted up or down by one according to whether the endpoints of the range should be considered to be part of the range and whether the array contains entries matching those endpoints.
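The queries listed above can be sketched with Python's standard `bisect` module, whose `bisect_left` and `bisect_right` functions correspond to the leftmost and rightmost procedures. The helper names, the use of `None` when no predecessor or successor exists, and the rule that ties in the nearest-neighbor query go to the predecessor are assumptions made for this example.

```python
import bisect
from typing import Optional, Sequence


def rank(a: Sequence[int], t: int) -> int:
    """Number of elements of a that are smaller than t."""
    return bisect.bisect_left(a, t)


def predecessor(a: Sequence[int], t: int) -> Optional[int]:
    """Largest element smaller than t, or None if there is none."""
    r = bisect.bisect_left(a, t)
    return a[r - 1] if r > 0 else None


def successor(a: Sequence[int], t: int) -> Optional[int]:
    """Smallest element larger than t, or None if there is none."""
    i = bisect.bisect_right(a, t)    # rightmost result r plus one
    return a[i] if i < len(a) else None


def nearest_neighbor(a: Sequence[int], t: int) -> Optional[int]:
    """Predecessor or successor of t, whichever is closer (ties: predecessor)."""
    p, s = predecessor(a, t), successor(a, t)
    if p is None:
        return s
    if s is None:
        return p
    return p if t - p <= s - t else s


def count_in_range(a: Sequence[int], low: int, high: int) -> int:
    """Number of elements x with low <= x < high, as a difference of two ranks."""
    return rank(a, high) - rank(a, low)
```

For a = [2, 3, 5, 7, 11, 13] and target 8, which is absent from the array, the rank is 4, the predecessor is 7, the successor is 11, and the nearest neighbor is 7.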


Performance 


Number of comparisons 


In terms of the number of comparisons, the performance of binary search can be analyzed by viewing the run of the procedure on a binary tree. The root node of the tree is the middle element of the array. The middle element of the lower half is the left child node of the root, and the middle element of the upper half is the right child node of the root. The rest of the tree is built in a similar fashion. Starting from the root node, the left or right subtrees are traversed depending on whether the target value is less or more than the node under consideration.


In the worst case, binary search makes $\lfloor \log_2(n) + 1 \rfloor$ iterations of the comparison loop, where the $\lfloor\ \rfloor$ notation denotes the floor function that yields the greatest integer less than or equal to the argument, and $\log_2$ is the binary logarithm. This is because the worst case is reached when the search reaches the deepest level of the tree, and there are always $\lfloor \log_2(n) + 1 \rfloor$ levels in the tree for any binary search.

The worst case may also be reached when the target element is not in the array. If n is one less than a power of two, then this is always the case. Otherwise, the search may perform $\lfloor \log_2(n) + 1 \rfloor$ iterations if the search reaches the deepest level of the tree. However, it may make $\lfloor \log_2(n) \rfloor$ iterations, which is one less than the worst case, if the search ends at the second-deepest level of the tree.


Figure 2 | A tree representing binary search. The array being searched here is [20, 30, 40, 50, 80, 90, 100], and the target value is 40.

On average, assuming that each element is equally likely to be searched, binary search makes $\lfloor \log_2(n) \rfloor + 1 - (2^{\lfloor \log_2(n) \rfloor + 1} - \lfloor \log_2(n) \rfloor - 2)/n$ iterations when the target element is in the array. This is approximately equal to $\log_2(n) - 1$ iterations. When the target element is not in the array, binary search makes $\lfloor \log_2(n) \rfloor + 2 - 2^{\lfloor \log_2(n) \rfloor + 1}/(n + 1)$ iterations on average, assuming that the range between and outside elements is equally likely to be searched.


In the best case, where the target value is the middle element of the array, its position is returned after one iteration.
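The worst-case and best-case iteration counts can be checked empirically with a short Python sketch. The helper names are chosen for this example; `n.bit_length()` is used because, for a positive integer n, it equals ⌊log₂(n)⌋ + 1 without floating-point rounding.

```python
from typing import Sequence


def iterations(a: Sequence[int], target: int) -> int:
    """Count the loop iterations the basic binary search performs for target."""
    lo, hi = 0, len(a) - 1
    count = 0
    while lo <= hi:
        count += 1
        mid = (lo + hi) // 2
        if a[mid] < target:
            lo = mid + 1
        elif a[mid] > target:
            hi = mid - 1
        else:
            break
    return count


def worst_case_iterations(n: int) -> int:
    """floor(log2(n) + 1) for n >= 1, computed exactly on integers."""
    return n.bit_length()


# Search for every element of a 7-element array (the array from Figure 2).
a = [20, 30, 40, 50, 80, 90, 100]
counts = [iterations(a, t) for t in a]
print(counts)                       # one count per element, in array order
print(max(counts), worst_case_iterations(len(a)))
```

The middle element 50 is found after one iteration, while the elements on the deepest level of the tree take three, matching ⌊log₂(7) + 1⌋ = 3.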


In terms of iterations, no search algorithm that works only by comparing elements can exhibit better average and worst-case performance than binary search. The comparison tree representing binary search has the fewest levels possible as every level above the lowest level of the tree is filled completely. Otherwise, the search algorithm can eliminate few elements in an iteration, increasing the number of iterations required in the average and worst case. This is the case for other search algorithms based on comparisons, as while they may work faster on some target values, the average performance over all elements is worse than binary search. By dividing the array in half, binary search ensures that the sizes of both subarrays are as similar as possible.


Space complexity 


Binary search requires three pointers to elements, which may be array indices or pointers to memory locations, regardless of the size of the array. However, it requires at least $\lceil \log_2(n) \rceil$ bits to encode a pointer to an element of an array with n elements. Therefore, the space complexity of binary search is O(log n). In addition, it takes O(n) space to store the array.


Derivation of average case 


The average number of iterations performed by binary search depends on the probability of each element being searched. The average case is different for successful searches and unsuccessful searches. It will be assumed that each element is equally likely to be searched for successful searches. For unsuccessful searches, it will be assumed that the intervals between and outside elements are equally likely to be searched. The average case for successful searches is the number of iterations required to search every element exactly once, divided by n, the number of elements. The average case for unsuccessful searches is the number of iterations required to search an element within every interval exactly once, divided by the n + 1 intervals.


Successful searches 


In the binary tree representation, a successful search can be represented by a path from the root to the target node, called an internal path. The length of a path is the number of edges (connections between nodes) that the path passes through. The number of iterations performed by a search, given that the corresponding path has length $l$, is $l + 1$, counting the initial iteration. The internal path length is the sum of the lengths of all unique internal paths. Since there is only one path from the root to any single node, each internal path represents a search for a specific element. If there are $n$ elements, which is a positive integer, and the internal path length is $I(n)$, then the average number of iterations for a successful search is $T(n) = 1 + \frac{I(n)}{n}$, with the one iteration added to count the initial iteration.

Figure 3 | The worst case is reached when the search reaches the deepest level of the tree, while the best case is reached when the target value is the middle element.


Since binary search is the optimal algorithm for searching with comparisons, this problem is reduced to calculating the minimum internal path length of all binary trees with n nodes, which is equal to:

$$I(n) = \sum_{k=1}^{n} \lfloor \log_2(k) \rfloor$$


For example, in a 7-element array, the root requires one iteration, the two elements below the root require two iterations, and the four elements below require three iterations. In this case, the internal path length is:

$$\sum_{k=1}^{7} \lfloor \log_2(k) \rfloor = 0 + 2(1) + 4(2) = 2 + 8 = 10$$

The average number of iterations would be $1 + \frac{10}{7} = 2\frac{3}{7}$ based on the equation for the average case. The sum for $I(n)$ can be simplified to:


$$I(n) = \sum_{k=1}^{n} \lfloor \log_2(k) \rfloor = (n + 1)\lfloor \log_2(n + 1) \rfloor - 2^{\lfloor \log_2(n + 1) \rfloor + 1} + 2$$


Substituting the equation for $I(n)$ into the equation for $T(n)$:

$$T(n) = 1 + \frac{(n + 1)\lfloor \log_2(n + 1) \rfloor - 2^{\lfloor \log_2(n + 1) \rfloor + 1} + 2}{n} = \lfloor \log_2(n) \rfloor + 1 - \left(2^{\lfloor \log_2(n) \rfloor + 1} - \lfloor \log_2(n) \rfloor - 2\right)/n$$

For integer n, this is equivalent to the equation for the average case on a successful search specified above.
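The closed form for I(n) and the two expressions for T(n) can be checked numerically. The sketch below uses exact integer and rational arithmetic; the function names are chosen for this example, and `k.bit_length() - 1` serves as ⌊log₂(k)⌋ for positive integers k.

```python
from fractions import Fraction


def floor_log2(k: int) -> int:
    """floor(log2(k)) for a positive integer k, without floating point."""
    return k.bit_length() - 1


def internal_path_length_sum(n: int) -> int:
    """I(n) as the sum of floor(log2(k)) for k = 1..n."""
    return sum(floor_log2(k) for k in range(1, n + 1))


def internal_path_length_closed(n: int) -> int:
    """I(n) from the simplified closed form."""
    e = floor_log2(n + 1)
    return (n + 1) * e - 2 ** (e + 1) + 2


def average_successful(n: int) -> Fraction:
    """T(n) = 1 + I(n) / n."""
    return 1 + Fraction(internal_path_length_closed(n), n)


def average_successful_simplified(n: int) -> Fraction:
    """T(n) in the form stated for the average case."""
    e = floor_log2(n)
    return e + 1 - Fraction(2 ** (e + 1) - e - 2, n)


print(internal_path_length_sum(7), average_successful(7))   # 10 17/7
```

For n = 7 both forms of I(n) give 10 and T(7) = 17/7, in agreement with the worked example.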


Unsuccessful searches 


Unsuccessful searches can be represented by augmenting the tree with external nodes, which forms an extended binary tree. If an internal node, or a node present in the tree, has fewer than two child nodes, then additional child nodes, called external nodes, are added so that each internal node has two children. By doing so, an unsuccessful search can be represented as a path to an external node, whose parent is the single element that remains during the last iteration. An external path is a path from the root to an external node. The external path length is the sum of the lengths of all unique external paths. If there are $n$ elements, which is a positive integer, and the external path length is $E(n)$, then the average number of iterations for an unsuccessful search is $T'(n) = \frac{E(n)}{n + 1}$, with the one iteration added to count the initial iteration. The external path length is divided by $n + 1$ instead of $n$ because there are $n + 1$ external paths, representing the intervals between and outside the elements of the array.


This problem can similarly be reduced to determining the minimum external path length of all binary trees with $n$ nodes. For all binary trees, the external path length is equal to the internal path length plus $2n$. Substituting the equation for $I(n)$:

$$E(n) = I(n) + 2n = \left[(n + 1)\lfloor \log_2(n + 1) \rfloor - 2^{\lfloor \log_2(n + 1) \rfloor + 1} + 2\right] + 2n = (n + 1)(\lfloor \log_2(n) \rfloor + 2) - 2^{\lfloor \log_2(n) \rfloor + 1}$$


Substituting the equation for $E(n)$ into the equation for $T'(n)$, the average case for unsuccessful searches can be determined:

$$T'(n) = \frac{(n + 1)(\lfloor \log_2(n) \rfloor + 2) - 2^{\lfloor \log_2(n) \rfloor + 1}}{n + 1} = \lfloor \log_2(n) \rfloor + 2 - 2^{\lfloor \log_2(n) \rfloor + 1}/(n + 1)$$
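The formula for T'(n) can be compared against a direct simulation. In the sketch below, which is an illustration under stated assumptions rather than part of the derivation, the array holds the odd numbers 1, 3, ..., 2n − 1 and the n + 1 even numbers 0, 2, ..., 2n serve as one absent target per interval; the function names are chosen for this example.

```python
from fractions import Fraction


def iterations_for(a: list, target: int) -> int:
    """Loop iterations of the basic binary search when looking for target."""
    lo, hi = 0, len(a) - 1
    count = 0
    while lo <= hi:
        count += 1
        mid = (lo + hi) // 2
        if a[mid] < target:
            lo = mid + 1
        elif a[mid] > target:
            hi = mid - 1
        else:
            break
    return count


def simulated_average_unsuccessful(n: int) -> Fraction:
    """Average iterations over one failed search in each of the n + 1 intervals."""
    a = [2 * i + 1 for i in range(n)]        # odd values only
    gaps = range(0, 2 * n + 1, 2)            # even values, all absent from a
    return Fraction(sum(iterations_for(a, t) for t in gaps), n + 1)


def formula_average_unsuccessful(n: int) -> Fraction:
    """floor(log2 n) + 2 - 2^(floor(log2 n) + 1) / (n + 1)."""
    e = n.bit_length() - 1                   # floor(log2(n)) for n >= 1
    return e + 2 - Fraction(2 ** (e + 1), n + 1)


print(simulated_average_unsuccessful(7), formula_average_unsuccessful(7))  # 3 3
```

For n = 7 every unsuccessful search takes exactly three iterations, which agrees with E(7) = 10 + 14 = 24 divided by the 8 intervals.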


Performance of alternative procedure 


Each iteration of the binary search procedure defined above makes one or two comparisons, checking if the middle element is equal to the target in each iteration. Assuming that each element is equally likely to be searched, each iteration makes 1.5 comparisons on average. A variation of the algorithm checks whether the middle element is equal to the target at the end of the search. On average, this eliminates half a comparison from each iteration. This slightly cuts the time taken per iteration on most computers. However, it guarantees that the search takes the maximum number of iterations, on average adding one iteration to the search. Because the comparison loop is performed only $\lfloor \log_2(n) + 1 \rfloor$ times in the worst case, the slight increase in efficiency per iteration does not compensate for the extra iteration for all but very large n.


Running time and cache use 


In analyzing the performance of binary search, another consideration is the time required to compare two elements. For integers and strings, the time required increases linearly as the encoding length (usually the number of bits) of the elements increases. For example, comparing a pair of 64-bit unsigned integers would require comparing up to double the bits as comparing a pair of 32-bit unsigned integers. The worst case is achieved when the integers are equal. This can be significant when the encoding lengths of the elements are large, such as with large integer types or long strings, which makes comparing elements expensive. Furthermore, comparing floating-point values (the most common digital representation of real numbers) is often more expensive than comparing integers or short strings.


On most computer architectures, the processor has a hardware cache separate from RAM. Since they are located within the processor itself, caches are much faster to access but usually store much less data than RAM. Therefore, most processors store memory locations that have been accessed recently, along with memory locations close to them. For example, when an array element is accessed, the element itself may be stored along with the elements that are stored close to it in RAM, making it faster to sequentially access array elements that are close in index to each other (locality of reference). On a sorted array, binary search can jump to distant memory locations if the array is large, unlike algorithms (such as linear search and linear probing in hash tables) which access elements in sequence. This adds slightly to the running time of binary search for large arrays on most systems.


Binary search versus other schemes 


Sorted arrays with binary search are a very inefficient solution when insertion and deletion operations are interleaved with retrieval, taking O(n) time for each such operation. In addition, sorted arrays can complicate memory use, especially when elements are often inserted into the array. There are other data structures that support much more efficient insertion and deletion. Binary search can be used to perform exact matching and set membership (determining whether a target value is in a collection of values). There are data structures that support faster exact matching and set membership. However, unlike many other searching schemes, binary search can be used for efficient approximate matching, usually performing such matches in O(log n) time regardless of the type or structure of the values themselves. In addition, there are some operations, like finding the smallest and largest element, that can be performed efficiently on a sorted array.
Linear search


Linear search is a simple search algorithm that checks every record until it finds the target value. Linear search can be done on a linked list, which allows for faster insertion and deletion than an array. Binary search is faster than linear search for sorted arrays except if the array is short, although the array needs to be sorted beforehand. All sorting algorithms based on comparing elements, such as quicksort and merge sort, require at least O(n log n) comparisons in the worst case. Unlike linear search, binary search can be used for efficient approximate matching. There are operations such as finding the smallest and largest element that can be done efficiently on a sorted array but not on an unsorted array.




Trees 


A binary search tree is a binary tree data structure that works based on the principle of binary search. The records of the tree are arranged in sorted order, and each record in the tree can be searched using an algorithm similar to binary search, taking on average logarithmic time. Insertion and deletion also require on average logarithmic time in binary search trees. This can be faster than the linear time insertion and deletion of sorted arrays, and binary trees retain the ability to perform all the operations possible on a sorted array, including range and approximate queries.


However, binary search is usually more efficient for searching as binary search trees will most likely be imperfectly balanced, resulting in slightly worse performance than binary search. This even applies to balanced binary search trees, binary search trees that balance their own nodes, because they rarely produce the tree with the fewest possible levels. Except for balanced binary search trees, the tree may be severely imbalanced with few internal nodes with two children, resulting in the average and worst-case search time approaching n comparisons. Binary search trees take more space than sorted arrays.

Figure 4 | Binary search trees are searched using an algorithm similar to binary search.
Chris Martin, public domain


Binary search trees lend themselves to fast searching in external memory stored in hard disks, as binary search trees can be efficiently structured in filesystems. The B-tree generalizes this method of tree organization. B-trees are frequently used to organize long-term storage such as databases and filesystems.


Hashing 


For implementing associative arrays, hash tables, a data structure that maps keys to records using a hash function, are generally faster than binary search on a sorted array of records. Most hash table implementations require only amortized constant time on average. However, hashing is not useful for approximate matches, such as computing the next-smallest, next-largest, and nearest key, as the only information given on a failed search is that the target is not present in any record. Binary search is ideal for such matches, performing them in logarithmic time. Some operations, like finding the smallest and largest element, can be done efficiently on sorted arrays but not on hash tables.


Set membership algorithms 


A related problem to search is set membership. Any algorithm that does lookup, like binary search, can also be used for set membership. There are other algorithms that are more specifically suited for set membership. A bit array is the simplest, useful when the range of keys is limited. It compactly stores a collection of bits, with each bit representing a single key within the range of keys. Bit arrays are very fast, requiring only O(1) time. The Judy1 type of Judy array handles 64-bit keys efficiently.


For approximate results, Bloom filters, another probabilistic data structure based on hashing, store a set of keys by encoding the keys using a bit array and multiple hash functions. Bloom filters are much more space-efficient than bit arrays in most cases and not much slower: with k hash functions, membership queries require only O(k) time. However, Bloom filters suffer from false positives.
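A Bloom filter of the kind described above can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the class name, the choice of SHA-256 with a 4-byte index prefix to derive the k hash functions, and the use of a `bytearray` as the bit array. Practical implementations usually use faster, non-cryptographic hash functions.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Set-membership sketch: no false negatives, but false positives are possible."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes                  # k in the text
        self.bits = bytearray((num_bits + 7) // 8)    # bit array, all zeros

    def _positions(self, key: bytes) -> Iterator[int]:
        # Derive k bit positions by hashing the key with k different prefixes.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "big") + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key: bytes) -> bool:
        # O(k) work: the key may be present only if all k bits are set.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


bf = BloomFilter(num_bits=1024, num_hashes=3)
for word in (b"binary", b"search"):
    bf.add(word)
print(b"binary" in bf)    # True: added keys are always reported as present
```

A key that was never added is usually reported as absent, but it can be reported as present when its k bits happen to have been set by other keys, which is the false-positive behavior mentioned above.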


Other data structures 


There exist data structures that may improve on binary search in some cases for both searching and other operations available for sorted arrays. For example, searches, approximate matches, and the operations available to sorted arrays can be performed more efficiently than binary search on specialized data structures such as van Emde Boas trees, fusion trees, tries, and bit arrays. These specialized data structures are usually only faster because they take advantage of the properties of keys with a certain attribute (usually keys that are small integers), and thus will be time or space consuming for keys that lack that attribute. As long as the keys can be ordered, these operations can always be done at least somewhat efficiently on a sorted array regardless of the keys. Some structures, such as Judy arrays, use a combination of approaches to mitigate this while retaining efficiency and the ability to perform approximate matching.


Variations 


Uniform binary search 
Main article: Uniform binary search 


Uniform binary search stores, instead of the lower and upper bounds, the difference in the index of the middle element from the current iteration to the next iteration. A lookup table containing the differences is computed beforehand. For example, if the array to be searched is [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], the middle element (m) would be 6. In this case, the middle element of the left subarray ([1, 2, 3, 4, 5]) is 3 and the middle element of the right subarray ([7, 8, 9, 10, 11]) is 9. Uniform binary search would store the value of 3 as both indices differ from 6 by this same amount. To reduce the search space, the algorithm either adds or subtracts this change from the index of the middle element. Uniform binary search may be faster on systems where it is inefficient to calculate the midpoint, such as on decimal computers.

Figure 5 | Uniform binary search stores the difference between the current and the two next possible middle elements instead of specific bounds.

Figure 6 | Visualization of exponential searching finding the upper bound for the subsequent binary search.


Exponential search 
Main article: Exponential search 


Exponential search extends binary search to unbounded lists. It starts by finding the first element that is greater than the target value and whose index is a power of two. Afterwards, it sets that index as the upper bound, and switches to binary search. A search takes $\lfloor \log_2 x + 1 \rfloor$ iterations before binary search is started and at most $\lfloor \log_2 x \rfloor$ iterations of the binary search, where $x$ is the position of the target value. Exponential search works on bounded lists, but becomes an improvement over binary search only if the target value lies near the beginning of the array.
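Exponential search can be sketched in Python as below. This is an illustration under assumptions: it operates on an ordinary bounded list, so the doubling phase also stops at the end of the list; it stops doubling at the first power-of-two index whose element is greater than or equal to the target; and the function name and the `None` result for an absent target are choices made for this example. The binary search phase uses the standard `bisect.bisect_left` function restricted to the bracketed range.

```python
import bisect
from typing import Optional, Sequence


def exponential_search(a: Sequence[int], target: int) -> Optional[int]:
    """Return an index of target in the sorted sequence a, or None if absent."""
    if not a:
        return None
    # Doubling phase: find a power-of-two index whose element is >= target,
    # or run past the end of the list.
    bound = 1
    while bound < len(a) and a[bound] < target:
        bound *= 2
    # The target, if present, lies between the previous bound and this one.
    lo = bound // 2
    hi = min(bound + 1, len(a))              # exclusive upper limit
    i = bisect.bisect_left(a, target, lo, hi)
    return i if i < hi and a[i] == target else None
```

Because the doubling phase stops soon after passing the target, the total work depends on the position of the target rather than on the length of the list, which is why the method pays off when the target lies near the beginning.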


Interpolation search 


Main article: Interpolation search 


Instead of calculating the midpoint, interpolation search estimates the position of the target value, taking into account the lowest and highest elements in the array as well as the length of the array. It works on the basis that the midpoint is not the best guess in many cases. For example, if the target value is close to the highest element in the array, it is likely to be located near the end of the array.


A common interpolation function is linear interpolation. If $A$ is the array, $L, R$ are the lower and upper bounds respectively, and $T$ is the target, then the target is estimated to be about $(T - A_L)/(A_R - A_L)$ of the way between $L$ and $R$. When linear interpolation is used, and the distribution of the array elements is uniform or near uniform, interpolation search makes O(log log n) comparisons.


In practice, interpolation search is slower than binary search for small arrays, as interpolation search requires extra computation. Its time complexity grows more slowly than binary search, but this only compensates for the extra computation for large arrays.
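Interpolation search with linear interpolation can be sketched in Python as follows. The function name, the `None` result for an absent target, and the handling of a range whose endpoints hold equal values are assumptions made for this example; integer keys are assumed so that the estimate can be computed with integer division.

```python
from typing import Optional, Sequence


def interpolation_search(a: Sequence[int], target: int) -> Optional[int]:
    """Return an index of target in the sorted sequence a, or None if absent."""
    lo, hi = 0, len(a) - 1
    # The target can only be present while it lies between a[lo] and a[hi].
    while lo <= hi and a[lo] <= target <= a[hi]:
        if a[hi] == a[lo]:
            return lo                # all remaining values equal the target
        # Estimate the position: (T - A_L) / (A_R - A_L) of the way from L to R.
        pos = lo + (target - a[lo]) * (hi - lo) // (a[hi] - a[lo])
        if a[pos] < target:
            lo = pos + 1
        elif a[pos] > target:
            hi = pos - 1
        else:
            return pos
    return None


evenly_spaced = [1, 4, 7, 10, 13, 16, 19, 22, 25]
print(interpolation_search(evenly_spaced, 19))   # 6, found by the first estimate
```

On evenly spaced data the first estimate lands on the target directly; on skewed data the estimates can be poor, in which case the search may need many more probes than binary search.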


Fractional cascading 
Main article: Fractional cascading 


Fractional cascading is a technique that speeds up binary searches for the same element in multiple sorted arrays. Searching each array separately requires O(k log n) time, where k is the number of arrays. Fractional cascading reduces this to O(k + log n) by storing specific information in each array about each element and its position in the other arrays.


Fractional cascading was originally developed to efficiently solve various computational geometry problems. Fractional cascading has been applied elsewhere, such as in data mining and Internet Protocol routing.


[Figure 7: a sorted array running from 1 to 37, with lower bound l = 1, upper bound u = 37 and target t = 25.
Estimate: (t − l)/(u − l) = (25 − 1)/(37 − 1) = 24/36 = 10/15]


Figure 7 | Visualization of interpolation search. In this case, no searching is needed because the estimate of
the target's location within the array is correct. Other implementations may specify another function for
estimating the target's location.


[Figure 8:
Array 1 (Blue): 1, 5, 12, 16, 22, 24
Array 2 (Yellow): 2, 8, 13, 17
Array 3 (Pink): 4, 7, 9, 15
n = 14 (number of elements)
k = 3 (number of arrays)]


Figure 8 | In fractional cascading, each array has pointers to every second element of another array, so only one
binary search has to be performed to search all the arrays.
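

The pointer structure described for fractional cascading can be sketched in Python using the three arrays listed for Figure 8. This is a simplified illustration under stated assumptions, not the original construction of Chazelle and Guibas: each array only stores pointers into the next array, the preprocessing step itself uses the standard `bisect` module for brevity, at least one array is assumed to exist, and the names `build_cascade` and `cascade_search` are hypothetical.

```python
from bisect import bisect_left


def build_cascade(lists):
    """Build one augmented level per list, from the last list to the first.

    Each level is a tuple (values, own_ptr, next_ptr) where, for values[j]:
      own_ptr[j]  is its lower-bound position in the original list, and
      next_ptr[j] is its lower-bound position in the next augmented level.
    """
    levels = []
    next_values = []  # values of the level built in the previous step
    for own in reversed(lists):
        # Merge the original list with every second value of the next level.
        values = sorted(own + next_values[1::2])
        own_ptr = [bisect_left(own, v) for v in values]
        next_ptr = [bisect_left(next_values, v) for v in values]
        levels.append((values, own_ptr, next_ptr))
        next_values = values
    levels.reverse()
    return levels


def cascade_search(lists, levels, target):
    """Return, for every list, the index of its first element >= target."""
    results = []
    # The only full binary search happens here, on the first level.
    pos = bisect_left(levels[0][0], target)
    for i, (values, own_ptr, next_ptr) in enumerate(levels):
        if pos < len(values):
            own_idx, next_idx = own_ptr[pos], next_ptr[pos]
        else:
            # target is larger than every value on this level
            own_idx = len(lists[i])
            next_idx = len(levels[i + 1][0]) if i + 1 < len(levels) else 0
        results.append(own_idx)
        if i + 1 < len(levels):
            # Follow the stored pointer, then step back at most one slot,
            # because only every second element of the next level was sampled.
            pos = next_idx
            if pos > 0 and levels[i + 1][0][pos - 1] >= target:
                pos -= 1
    return results


arrays = [[1, 5, 12, 16, 22, 24], [2, 8, 13, 17], [4, 7, 9, 15]]
levels = build_cascade(arrays)
print(cascade_search(arrays, levels, 9))  # [2, 2, 2]
```

After the single binary search on the first level, each further array costs only a pointer lookup and at most one comparison, which is where the O(k + log n) query bound comes from in this sketch.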


Generalization to graphs 


Binary search has been generalized to work on certain 
types of graphs, where the target value is stored in a 
vertex instead of an array element. Binary search trees 
are one such generalization—when a vertex (node) in 
the tree is queried, the algorithm either learns that the 
vertex is the target, or otherwise which subtree the tar- 
get would be located in. However, this can be further 
generalized as follows: given an undirected, positively 
weighted graph and a target vertex, the algorithm 
learns upon querying a vertex that it is equal to the tar- 
get, or it is given an incident edge that is on the shortest 
path from the queried vertex to the target. The stand- 
ard binary search algorithm is simply the case where the 
graph is a path. Similarly, binary search trees are the 
case where the edges to the left or right subtrees are 
given when the queried vertex is unequal to the target. 
For all undirected, positively weighted graphs, there is
an algorithm that finds the target vertex in O(log n)
queries in the worst case.


Noisy binary search 


Noisy binary search algorithms solve the case where the
algorithm cannot reliably compare elements of the array.
For each pair of elements, there is a certain probability
that the algorithm makes the wrong comparison. Noisy
binary search can find the correct position of the target
with a given probability that controls the reliability of
the yielded position. Every noisy binary search procedure
must make at least
$(1 - \tau)\frac{\log_2 n}{H(p)} - \frac{10}{H(p)}$
comparisons on average, where
$H(p) = -p \log_2 p - (1 - p) \log_2 (1 - p)$ is the binary
entropy function and $\tau$ is the probability that the
procedure yields the wrong position.[49][50][51] The noisy
binary search problem can be considered as a case of the
Rényi-Ulam game, a variant of Twenty Questions where the
answers may be wrong.


Quantum binary search


Classical computers are bounded to the worst case of
exactly $\lfloor \log_2 n + 1 \rfloor$ iterations when
performing binary search. Quantum algorithms for binary
search are still bounded to a proportion of $\log_2 n$
queries (representing iterations of the classical
procedure), but the constant factor is less than one,
providing for a lower time complexity on quantum
computers. Any exact quantum binary search procedure (that
is, a procedure that always yields the correct result)
requires at least
$\frac{1}{\pi}(\ln n - 1) \approx 0.22 \log_2 n$ queries in
the worst case, where ln is the natural logarithm. There is
an exact quantum binary search procedure that runs in
$4 \log_{605} n \approx 0.433 \log_2 n$ queries in the worst
case. In comparison, Grover's algorithm is the optimal
quantum algorithm for searching an unordered list of
elements, and it requires $O(\sqrt{n})$ queries.
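
To make the constants above concrete, the three query counts can be evaluated numerically. The short Python sketch below only restates the formulas from the text; the function names are hypothetical and the code does not simulate any quantum procedure.

```python
import math


def classical_iterations(n):
    """Worst-case iterations of classical binary search: floor(log2(n) + 1)."""
    return math.floor(math.log2(n) + 1)


def quantum_lower_bound(n):
    """Lower bound on exact quantum queries: (ln(n) - 1) / pi."""
    return (math.log(n) - 1) / math.pi


def quantum_exact_queries(n):
    """Queries of the exact quantum procedure: 4 * log base 605 of n."""
    return 4 * math.log(n, 605)


n = 2 ** 20
print(classical_iterations(n))   # 21
print(quantum_lower_bound(n))    # roughly 4.1
print(quantum_exact_queries(n))  # roughly 8.7
```

For about a million elements the exact quantum procedure needs fewer than half as many queries as the classical worst case, consistent with a constant factor of about 0.433.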


History 


The idea of sorting a list of items to allow for faster
searching dates back to antiquity. The earliest known
example was the Inakibit-Anu tablet from Babylon dating
back to c. 200 BCE. The tablet contained about 500
sexagesimal numbers and their reciprocals sorted in
lexicographical order, which made searching for a specific
entry easier. In addition, several lists of names that
were sorted by their first letter were discovered on the
Aegean Islands. Catholicon, a Latin dictionary finished
in 1286 CE, was the first work to describe rules for
sorting words into alphabetical order, as opposed to just
the first few letters.


In 1946, John Mauchly made the first mention of binary
search as part of the Moore School Lectures, a seminal
and foundational college course in computing. In 1957,
William Wesley Peterson published the first method for
interpolation search. Every published binary search
algorithm worked only for arrays whose length is one less
than a power of two[h] until 1960, when Derrick Henry
Lehmer published a binary search algorithm that worked on
all arrays. In 1962, Hermann Bottenbruch presented an
ALGOL 60 implementation of binary search that placed the
comparison for equality at the end, increasing the average
number of iterations by one, but reducing to one the
number of comparisons per iteration. The uniform binary
search was developed by A. K. Chandra of Stanford
University in 1971. In 1986, Bernard Chazelle and Leonidas
J. Guibas introduced fractional cascading as a method to
solve numerous search problems in computational
geometry.[46][60][61]


Implementation issues 


"Although the basic idea of binary search is comparatively
straightforward, the details can be surprisingly tricky ..."
(Donald Knuth)


When Jon Bentley assigned binary search as a problem in a
course for professional programmers, he found that ninety
percent failed to provide a correct solution after several
hours of working on it, mainly because the incorrect
implementations failed to run or returned a wrong answer
in rare edge cases. A study published in 1988 found
accurate code for it in only five out of twenty textbooks.
Furthermore, Bentley's own implementation of binary
search, published in his 1986 book Programming Pearls,
contained an overflow error that remained undetected for
over twenty years. The Java programming language library
implementation of binary search had the same overflow bug
for more than nine years.


In a practical implementation, the variables used to
represent the indices will often be of fixed size, and
this can result in an arithmetic overflow for very large
arrays. If the midpoint of the span is calculated as
$\frac{L + R}{2}$, then the value of L + R may exceed the
range of integers of the data type used to store the
midpoint, even if L and R are within the range. If L and R
are nonnegative, this can be avoided by calculating the
midpoint as $L + \frac{R - L}{2}$.
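

The overflow can be reproduced with a small Python sketch. Python integers do not overflow on their own, so this example assumes a signed 32-bit index type and simulates its wraparound with `ctypes.c_int32`; the helper names `int32`, `naive_midpoint` and `safe_midpoint` are hypothetical.

```python
import ctypes


def int32(x):
    """Wrap x to a signed 32-bit integer, as a fixed-size index type would."""
    return ctypes.c_int32(x).value


def naive_midpoint(lo, hi):
    """Midpoint computed as (L + R) / 2; the sum may wrap around."""
    return int32(lo + hi) // 2


def safe_midpoint(lo, hi):
    """Midpoint computed as L + (R - L) / 2; safe for nonnegative L <= R."""
    return int32(lo + int32(hi - lo) // 2)


lo, hi = 1_500_000_000, 2_000_000_000  # both fit in a signed 32-bit integer
print(naive_midpoint(lo, hi))  # negative: the sum L + R wrapped around
print(safe_midpoint(lo, hi))   # 1750000000
```

Both indices are valid 32-bit values, yet their sum is not, so the naive formula yields a negative midpoint while the rearranged formula stays within range.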


An infinite loop may occur if the exit conditions for the
loop are not defined correctly. Once L exceeds R, the
search has failed and must convey the failure of the
search. In addition, the loop must be exited when the
target element is found, or in the case of an
implementation where this check is moved to the end,
checks for whether the search was successful or failed at
the end must be in place. Bentley found that most of the
programmers who incorrectly implemented binary search made
an error in defining the exit conditions.
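

The two sets of exit conditions described above can be sketched in Python as follows. Both functions are illustrative, with hypothetical names; they assume a list sorted in ascending order and use -1 as an assumed convention for reporting a failed search.

```python
def binary_search(a, target):
    """Variant that exits as soon as the target is found."""
    lo, hi = 0, len(a) - 1
    while lo <= hi:  # once lo exceeds hi, the search has failed
        mid = lo + (hi - lo) // 2
        if a[mid] < target:
            lo = mid + 1
        elif a[mid] > target:
            hi = mid - 1
        else:
            return mid  # found: leave the loop immediately
    return -1  # convey the failure to the caller


def binary_search_deferred(a, target):
    """Variant that defers the equality check to the end of the search."""
    if not a:
        return -1
    lo, hi = 0, len(a) - 1
    while lo != hi:
        mid = lo + (hi - lo + 1) // 2  # round up so that the span always shrinks
        if a[mid] > target:
            hi = mid - 1
        else:
            lo = mid
    # One element remains; this final check decides success or failure.
    return lo if a[lo] == target else -1
```

In the deferred variant, rounding the midpoint up is what guarantees progress: with a rounded-down midpoint, the assignment `lo = mid` could leave the span unchanged and loop forever, which is the kind of exit-condition error the text describes.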


Library support 


Many languages’ standard libraries include binary 
search routines: 


• C provides the function bsearch() in its standard
library, which is typically implemented via binary
search, although the official standard does not
require it.


• C++'s Standard Template Library provides the
functions binary_search(), lower_bound(),
upper_bound() and equal_range().


• COBOL provides the SEARCH ALL verb for performing
binary searches on COBOL ordered tables.


• Go's sort standard library package contains the
functions Search, SearchInts, SearchFloat64s, and
SearchStrings, which implement general binary
search, as well as specific implementations for
searching slices of integers, floating-point numbers,
and strings, respectively.


• Java offers a set of overloaded binarySearch()
static methods in the classes Arrays and Collections
in the standard java.util package for performing
binary searches on Java arrays and on Lists,
respectively.[71][72]


• Microsoft's .NET Framework 2.0 offers static
generic versions of the binary search algorithm in its
collection base classes. An example would be
System.Array's method
BinarySearch<T>(T[] array, T value).


• For Objective-C, the Cocoa framework provides the
NSArray -indexOfObject:inSortedRange:options:usingComparator:
method in Mac OS X 10.6+. Apple's Core Foundation C
framework also contains a CFArrayBSearchValues()
function.


• Python provides the bisect module.


• Ruby's Array class includes a bsearch method with
built-in approximate matching.
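

As one illustration of the library routines listed above, Python's `bisect` module can be used for exact-match lookup as well as for counting duplicates. The wrapper name `index_of` and the -1 failure value are assumptions made for this sketch; `bisect_left` and `bisect_right` are the module's own functions.

```python
from bisect import bisect_left, bisect_right


def index_of(a, target):
    """Return the leftmost index of target in the sorted list a, or -1."""
    i = bisect_left(a, target)  # first position whose value is >= target
    return i if i < len(a) and a[i] == target else -1


def count_of(a, target):
    """Return how many times target occurs in the sorted list a."""
    return bisect_right(a, target) - bisect_left(a, target)


values = [1, 3, 3, 5, 8]
print(index_of(values, 3))  # 1
print(index_of(values, 4))  # -1
print(count_of(values, 3))  # 2
```

Because `bisect_left` returns an insertion point rather than a found/not-found flag, the caller has to compare the element at that position with the target, as `index_of` does here.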


Notes


a. Any search algorithm based solely on comparisons
can be represented using a binary comparison tree.
An internal path is any path from the root to an
existing node. Let I be the internal path length, the
sum of the lengths of all internal paths. If each
element is equally likely to be searched, the average
case is $1 + \frac{I}{n}$, or simply one plus the average
of all the internal path lengths of the tree. This is
because internal paths represent the elements that the
search algorithm compares to the target. The lengths of
these internal paths represent the number of iterations
after the root node. Adding the average of these lengths
to the one iteration at the root yields the average case.
Therefore, to minimize the average number of comparisons,
the internal path length I must be minimized. It turns
out that the tree for binary search minimizes the
internal path length. Knuth 1998 proved that the external
path length (the path length over all nodes where both
children are present for each already-existing node) is
minimized when the external nodes (the nodes with no
children) lie within two consecutive levels of the tree.
This also applies to internal paths, as the internal path
length I is linearly related to the external path length
E. For any tree of n nodes, $I = E - 2n$. When each
subtree has a similar number of nodes, or equivalently
the array is divided into halves in each iteration, the
external nodes as well as their interior parent nodes lie
within two levels. It follows that binary search
minimizes the average number of comparisons, as its
comparison tree has the lowest possible internal path
length.


b. Knuth 1998 showed on his MIX computer model,
which Knuth designed as a representation of an
ordinary computer, that the average running time of
this variation for a successful search is
$17.5 \log_2 n + 17$ units of time compared to
$18 \log_2 n - 16$ units for regular binary search. The
time complexity for this variation grows slightly more
slowly, but at the cost of higher initial complexity.


c. Knuth 1998 performed a formal time performance
analysis of both of these search algorithms. On
Knuth's MIX computer, which Knuth designed as a
representation of an ordinary computer, binary
search takes on average $18 \log_2 n - 16$ units of time
for a successful search, while linear search with a
sentinel node at the end of the list takes
$1.75n + 8.5 - \frac{n \bmod 2}{4n}$ units. Linear search
has lower initial complexity because it requires minimal
computation, but it quickly outgrows binary search in
complexity. On the MIX computer, binary search only
outperforms linear search with a sentinel if n > 44.


d. Inserting the values in sorted order or in an
alternating lowest-highest key pattern will result in a
binary search tree that maximizes the average and
worst-case search time.


e. It is possible to search some hash table
implementations in guaranteed constant time.


f. This is because simply setting all of the bits which
the hash functions point to for a specific key can
affect queries for other keys which have a common
hash location for one or more of the functions.


g. There exist improvements of the Bloom filter which
improve on its complexity or support deletion; for
example, the cuckoo filter exploits cuckoo hashing
to gain these advantages.


h. That is, arrays of length 1, 3, 7, 15, 31, ...